Single View Modeling and View Synthesis
This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture the 3D world and display it in 2D using mature technologies. However, there is a strong desire to record and re-explore the 3D world in 3D. Current approaches to this goal usually rely on a camera array, which suffers from tedious setup and calibration processes as well as a lack of portability, limiting its application to lab experiments.
In this thesis, I produce 3D content using a single camera, making the process as simple as shooting pictures. It requires a new front-end capture device rather than a regular camcorder, as well as more sophisticated algorithms. First, to capture highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color-plus-depth image sequences at 30 fps, which is necessary for capturing dynamic scenes. Based on these color-plus-depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of an object at any instant, the partial surfaces are assembled into a complete 3D model by a novel warping algorithm.
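The inverse-square intensity fall-off that gives LFS its name can be sketched as follows. This is a minimal illustration of the textbook fall-off geometry, not necessarily the thesis's exact formulation: a point source is captured twice, moved back by a known offset along the viewing axis, so the unknown albedo-and-source term cancels in the per-pixel intensity ratio and depth can be solved in closed form.

```python
import numpy as np

def lfs_depth(i_near, i_far, delta):
    """Estimate per-pixel depth via a light fall-off stereo ratio.

    Assumes a point light source obeying the inverse-square law,
    imaged twice with the source moved back by `delta`:
        I_near = k / r**2,  I_far = k / (r + delta)**2.
    The albedo/source constant k cancels in the ratio.
    """
    q = np.sqrt(i_near / i_far)   # q = (r + delta) / r
    return delta / (q - 1.0)      # solve for r

# Synthetic check: a surface point at r = 2.0, source moved by 0.5
r_true, delta = 2.0, 0.5
i_near = 1.0 / r_true**2
i_far = 1.0 / (r_true + delta)**2
print(lfs_depth(i_near, i_far, delta))  # → 2.0
```

In practice the two captures would be per-pixel image arrays, and noise makes the ratio unstable where intensities are small, which is one reason a purpose-built capture device is needed.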
Inspired by the success of single-view 3D modeling, I extended my exploration to 2D-to-3D video conversion that does not use a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much of the depth-inference work as possible from the user to the computer. I developed two new methods that analyze the optical flow to provide additional qualitative depth constraints. The automatically extracted depth information is presented in the user interface to assist the user's labeling work.
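One common way such qualitative depth constraints arise from optical flow can be sketched as follows. This is an illustrative heuristic, not the thesis's exact method, and the function and mask names are ours: for a camera translating parallel to the image plane, flow magnitude is inversely proportional to depth, so the faster-moving of two regions is taken to be in front.

```python
import numpy as np

def ordinal_depth_from_flow(flow_mag, region_a, region_b, margin=1.2):
    """Return a qualitative depth-ordering constraint between two regions.

    Illustrative heuristic: under camera translation parallel to the
    image plane, flow magnitude ~ 1/depth, so the region with clearly
    larger mean flow is judged closer. `region_a`/`region_b` are
    boolean masks; `margin` guards against noise-level differences.
    """
    ma = flow_mag[region_a].mean()
    mb = flow_mag[region_b].mean()
    if ma > margin * mb:
        return "a_in_front_of_b"
    if mb > margin * ma:
        return "b_in_front_of_a"
    return "ambiguous"

# Toy example: left half moves fast (near), right half slowly (far)
flow = np.zeros((4, 8))
flow[:, :4] = 3.0
flow[:, 4:] = 1.0
left = np.zeros((4, 8), dtype=bool)
left[:, :4] = True
print(ordinal_depth_from_flow(flow, left, ~left))  # → a_in_front_of_b
```

Constraints of this ordinal form ("A is in front of B") are well suited to a semi-automatic interface, since the user only needs to confirm or veto them rather than paint dense depth.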
In this thesis, I developed new algorithms to produce 3D content from a single camera. Depending on the input data, my algorithms can build high-fidelity 3D models of dynamic and deformable objects when depth maps are provided; otherwise, they can turn monocular video clips into stereoscopic video.
Phase Retrieval with Random Phase Illumination
This paper presents a detailed numerical study of the performance of the standard phasing algorithms with random phase illumination (RPI). Phasing with high-resolution RPI and a sufficient oversampling ratio determines a unique phasing solution up to a global phase factor. Under this condition, the standard phasing algorithms converge rapidly to the true solution without stagnation. Excellent approximation is achieved after a small number of iterations, not just with high-resolution but also with low-resolution RPI, in the presence of additive as well as multiplicative noise. It is shown that RPI is sufficient for phasing complex-valued images under a sector condition and for phasing nonnegative images. The Error Reduction algorithm with RPI is proved to converge to the true solution under proper conditions.
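A minimal sketch of the Error Reduction iteration under random phase illumination may make the abstract concrete. The image size, oversampling grid, iteration count, and support-plus-nonnegativity constraint below are our assumptions for illustration, not the paper's exact experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, pad = 32, 64                        # oversampled FFT grid

# Nonnegative ground-truth image with known support in the corner block
x_true = np.zeros((pad, pad))
x_true[:n, :n] = rng.random((n, n))

# Random phase illumination: unit-modulus mask with random phases
mask = np.ones((pad, pad), dtype=complex)
mask[:n, :n] = np.exp(2j * np.pi * rng.random((n, n)))

y = np.abs(np.fft.fft2(mask * x_true))  # measured Fourier magnitudes

# Error Reduction: alternate the Fourier-magnitude and object constraints
x = np.zeros((pad, pad))
x[:n, :n] = rng.random((n, n))          # random nonnegative start
for _ in range(300):
    z = np.fft.fft2(mask * x)
    z = y * np.exp(1j * np.angle(z))            # impose measured magnitudes
    u = np.real(np.fft.ifft2(z) * np.conj(mask))  # undo mask (|mask| = 1)
    x = np.zeros_like(x)
    x[:n, :n] = np.maximum(u[:n, :n], 0.0)      # support + nonnegativity

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(rel_err)  # small residual: no stagnation with the random mask
```

Without the random mask, plain ER on nonnegative images is notorious for stagnating; the mask breaks the translation and twin-image ambiguities, which is the behavior the paper studies quantitatively.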
Task Driven Generative Modeling for Unsupervised Domain Adaptation: Application to X-ray Image Segmentation
Automatic parsing of anatomical objects in X-ray images is critical to many
clinical applications, in particular toward image-guided intervention and workflow
automation. Existing deep network models require a large amount of labeled
data. However, obtaining accurate pixel-wise labeling in X-ray images relies
heavily on skilled clinicians due to the large overlaps of anatomy and the
complex texture patterns. On the other hand, organs in 3D CT scans preserve
clearer structures as well as sharper boundaries and thus can be easily
delineated. In this paper, we propose a novel model framework for learning
automatic X-ray image parsing from labeled CT scans. Specifically, a Dense
Image-to-Image network (DI2I) for multi-organ segmentation is first trained on
X-ray like Digitally Reconstructed Radiographs (DRRs) rendered from 3D CT
volumes. Then we introduce a Task Driven Generative Adversarial Network
(TD-GAN) architecture to achieve simultaneous style transfer and parsing for
unseen real X-ray images. TD-GAN consists of a modified cycle-GAN substructure
for pixel-to-pixel translation between DRRs and X-ray images and an added
module leveraging the pre-trained DI2I to enforce segmentation consistency. The
TD-GAN framework is general and can be easily adapted to other learning tasks.
In the numerical experiments, we validate the proposed model on 815 DRRs and
153 topograms. While the vanilla DI2I without any adaptation fails completely
on segmenting the topograms, the proposed model does not require any topogram
labels and achieves a promising average Dice score of 85%, approaching
the accuracy of supervised training (88%).
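The composition of the TD-GAN generator objective can be sketched as a translation loss plus a term from the frozen, pre-trained segmenter. The function names are ours, and the low-entropy term below is a simple stand-in for the paper's segmentation-consistency module, not its exact loss:

```python
import numpy as np

def td_gan_generator_loss(xray, translate, discriminator, segmenter,
                          lam_adv=1.0, lam_seg=10.0):
    """Illustrative TD-GAN-style generator objective (names are ours).

    `translate` maps a real X-ray into the DRR style; `discriminator`
    scores realism of the translated image in [0, 1]; `segmenter` is a
    frozen, pre-trained multi-organ network (the DI2I in the paper).
    The added term rewards confident (low-entropy) segmentations on
    translated images, a stand-in for segmentation consistency.
    """
    fake_drr = translate(xray)
    adv = -np.log(discriminator(fake_drr) + 1e-8)          # fool the critic
    probs = segmenter(fake_drr)                            # per-pixel softmax
    seg_entropy = -np.sum(probs * np.log(probs + 1e-8), axis=-1).mean()
    return lam_adv * adv + lam_seg * seg_entropy

# Toy check with dummy callables (shapes only, no learning)
x = np.zeros((4, 4))
loss = td_gan_generator_loss(
    x,
    translate=lambda im: im + 1.0,
    discriminator=lambda im: 0.5,
    segmenter=lambda im: np.full(im.shape + (2,), 0.5),
)
print(loss)  # → 11 * ln(2), since both terms evaluate to ln(2)
```

The key design point survives the simplification: the segmenter's weights are never updated, so it acts purely as a task-driven constraint that keeps the style transfer from destroying anatomy.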
Stick-Breaking Policy Learning in Dec-POMDPs
Expectation maximization (EM) has recently been shown to be an efficient
algorithm for learning finite-state controllers (FSCs) in large decentralized
POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often
converge to local maxima that are far from optimal. This paper considers a
variable-size FSC to represent the local policy of each agent. These
variable-size FSCs are constructed using a stick-breaking prior, leading to a
new framework called \emph{decentralized stick-breaking policy representation}
(Dec-SBPR). This approach learns the controller parameters with a variational
Bayesian algorithm without having to assume that the Dec-POMDP model is
available. The performance of Dec-SBPR is demonstrated on several benchmark
problems, showing that the algorithm scales to large problems while
outperforming other state-of-the-art methods.
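The stick-breaking construction behind the variable-size controllers can be sketched in a few lines. This is the standard GEM stick-breaking prior, shown for illustration; the concentration value and stick count here are arbitrary, not the paper's settings:

```python
import numpy as np

def stick_breaking_weights(alpha, size, rng):
    """Draw weights from a stick-breaking (GEM) prior.

    v_k ~ Beta(1, alpha);  w_k = v_k * prod_{j<k} (1 - v_j).
    Each draw breaks off a fraction of the remaining stick, so the
    weights decay and sum to (almost) 1. In Dec-SBPR, weights of this
    form let the effective number of FSC nodes grow with the data
    instead of being fixed in advance.
    """
    v = rng.beta(1.0, alpha, size)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

rng = np.random.default_rng(0)
w = stick_breaking_weights(alpha=2.0, size=50, rng=rng)
print(w.sum())  # close to 1; leftover stick mass shrinks geometrically
```

Truncating at a finite number of sticks, as above, is the usual variational-inference device: the tail mass is negligible, so a finite controller approximates the nonparametric prior.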
Comparative analysis of binocular summation of the pattern visual evoked potential before and after surgery for concomitant strabismus
AIM: To investigate the timing of concomitant strabismus surgery and its role in the treatment of strabismic amblyopia by analyzing changes in the binocular summation of the pattern visual evoked potential (P-VEP) before and after surgery for concomitant strabismus.

METHODS: In this retrospective study we investigated 67 cases admitted to our hospital. All patients were under 18 years of age, and the postoperative squint angle was less than ±10 prism diopters. Patients were divided into three groups according to strabismus type, age, and amblyopia degree. The binocular summation response of the P-VEP was recorded in all cases before strabismus surgery and at 1 month and 3 months after surgery. The P-VEP binocular/monocular (B/M) response ratio was taken as the evaluation index.

RESULTS: The B/M value of all three groups improved markedly 1 month after surgery, and the difference was statistically significant (P<0.01). 1) At 3 months after surgery, the B/M value in the esotropia group was higher than that in the exotropia group (P<0.05). 2) At 3 months after surgery, the B/M value in the ≤6 years group was higher than that in the >12 years group (P<0.05). 3) At 1 month after surgery, the B/M value in the severe amblyopia group was higher than that in the mild group (P<0.05); at 3 months after surgery, this difference was even more pronounced (P<0.01).

CONCLUSION: Concomitant strabismus surgery is suggested before 6 years of age in patients whose vision is difficult to improve with amblyopia treatment alone, especially those with severe amblyopia and esotropia (accommodative esotropia must be excluded). Early surgery benefits amblyopia treatment and the recovery of binocular vision.